Abstract
Workflow to generate phyloseq objects, analyze data, and create figures for the fish microbiome paper, processed using the previous DADA2 workflow. Here we report both methods and results in plain text and R code. The document represents a completely reproducible workflow of our study.All you need to do to run this workflow is to install the appropriate R packages and download the following file:
We also made several additional data products and processing scripts available. * https://doi.org/10.6084/m9.figshare.7357178: DOI for this workflow. * https://doi.org/10.6084/m9.figshare.6875522: Raw data for each sample (before removing primers). * Trimmed data (primers removed) are deposited at the European Nucleotide Archive under the study accession number PRJEB28397 (ERP110594). * https://doi.org/10.6084/m9.figshare.6997253: DOI for the DADA2 processing workflow.
This document is interactive. You can sort and scroll through most of the tables and the phylogenetic tree is zoomable. In the upper right hand corner of the front page is a Code button. Use this to show or hide all the code in the document (default is hide) as well as download the .Rmd file which you can use to extract the code. Let’s proceed.
Important Definitions & Abbreviations
Our goals of this study were to:
Before we proceed with microbial community analysis, we will run some analyses on field-based behavioral assays of the different herbivorous reef fish species.
In this first part we go through the steps of defining sample groups, creating phyloseq objects, removing unwanted samples, and removing contaminant ASVs. Various parts of this section can easily be modified to perform different analyses. For example, if you were only interested in a specific taxa or group of samples, you could change the code here to create new phyloseq objects.
In the second part, we assess taxonomic composition as well as alpha and beta diversity. Phyloseq offers many options for assessing diversity, including several alpha diversity metrics, additional ordination and distance methods, and so on. You can play around with these settings to how it affects the results.
We wanted to understand how ASVs partitioned across host species. We also wanted to assess the specificity of each ASV to determine habitat preference. To our knowledge there is no quantitative way to do this. The only attempt we are aware of was MetaMetaDB but it is based on a 454 database and no longer seems to be in active development. So we used an approach based on the work of Sullam et. al., first identifying differentially abundant ASVs, then searching for closest database hits, and finally using phylogenetic analysis and top hit metadata (isolation source, natural host) to infer habitat preference.
In this section we pull together the results from Part III and try to make sense of the microbiomes from these herbivorous reef fish. How are ASVs partitioning across host? How similar are these ASVs to sequences from other studies? What can these patterns tell us about host specificity?
All tables and figures presented below are named as they appeared in the original publication. We also include many additional data productes that were not part of the original publication.
Throughout this workflow we are going to rely on color to help us tell our story. We will use color to delineate host fish species, but more importantly, to delineate microbial taxa. Microbial diversity is pretty vast and it can be difficult to display all of this diversity in a single, static figure. Additionally, many of us have a decreased ability to see color or differences in color. So it is not only important to use relatively few colors but also a color blind friendly palette. For our figures, we generated a palettes based on Bang Wong’s scheme described in this paper. Wong’s color scheme uses contrasting colors that can be distinguished by folks with color vision deficiency—roughly 8% of people (mostly males) are color blind. Do want Keanu Reeves to understand your figures or not?
This scheme is conservative—there are only 7 colors. We added black and grey to give us a little wiggle room. Others have developed 12 and 15 color palette schemes, but be careful—figures with too many colors can inhibit our ability to discern patterns. Our conservative palette forces us to choose carefully when deciding which taxa to target or how many groups to display. Here we will create two palettes—one for microbial taxa with all the colors and another for the five host fish species. The latter is just a subset of the full palette. Here is the code:
#Full palette
friend_pal <- c("#009E73", "#D55E00", "#F0E442", "#CC79A7", "#56B4E9",
"#E69F00", "#0072B2", "#7F7F7F", "#000000", "#FFFFFF")
#Fish palette
samp_pal <- c("#CC79A7", "#0072B2", "#009E73", "#56B4E9", "#E69F00")To characterize the foraging ecology of the different species of herbivorous fishes, we characterized in detail individual bites by each species at three sites in the Florida Keys (Conch, French, Molasses reefs) during the Boreal summers of 2014 and 2016. At each site, we haphazardly selected focal fish over a wide range of sizes and then randomly selected a single bite by each individual to describe (see Supplementary Table 2 for sample sizes). For each bite, we identified the food item(s) targeted as well as characteristics of the substrate (e.g., hard bottom vs. other common substrates such as sponges, gorgonians, etc.) at the precise location of the bite. For hard substrates, we recorded whether a bite was on a convex, concave, or flat surface, and whether that surface was oriented horizontally (< 45 degrees) or vertically (> 45 degrees). In addition, we framed each bite within a 5 x 5 cm micro-quadrat and measured the depth of the sediment and height of the algae at several points to determine the average sediment depth and algal height within the vicinity of the bite36. We then manually removed sediments and determined whether the fish left a distinct grazing scar (i.e., where calcium carbonate had been removed from the reef framework in addition to epilithic algae).
To visualize the multivariate patterns of herbivory we used non-metric multidimensional scaling (NMDS) as implemented via the metaMDS function in the vegan package. For each species of herbivorous fish at each site we calculated the proportion of bites focused on each prey item (‘prey variables’) as well as the proportion of bites targeting substrates with different characteristics (e.g., convex vs. concave vs. flat). We also calculated the proportion of bites resulting in a grazing scar. For bites on turf assemblages, we calculated the mean turf height and sediment depth directly adjacent to each bite. Prior to analysis, quantitative variables (e.g., sediment depth and turf height) were rescaled to the range of 0 to 1. In addition, quantitative and categorical variables were rescaled such that they would have similar influence to the ‘prey variables’ by dividing each variable by the number of prey categories. Rescaled data were then analysed via NMDS using a Bray-Curtis dissimilarity matrix.
First, we read in and display the table describing the foraging ecology of these fish.
library(ggthemes)
all_traits <- read.csv("MANUAL_INPUT/Mean_bite_characteristics.txt", header = TRUE, sep = "\t")
ids <- all_traits[,1:2]
write.table(all_traits, "R_OUTPUT/Supplementary_Table_1.txt", sep = "\t",
row.names = FALSE, quote = FALSE)
datatable(all_traits, rownames = FALSE, width = "100%", colnames = c(
"Host species", "Site", "Sediment depth (mm)", "Turf height (mm)",
"Prop mark on substrate", "Vertical_substrate", "Prop concave",
"Prop convex", "Articulated coralline", "CCA", "Dictyota", "epiphytes",
"Gorgonian", "Halimeda", "Laurencia", "sponge", "Stypopodium", "turf"),
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Supplementary Table 1: ",
htmltools::em("INSERT DESCRIPTION.")),
extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Blrtip", pageLength = 5, lengthMenu = c(5, 10, 15),
buttons = c("csv", "copy"), scrollX = TRUE, scrollCollapse = TRUE))Next we standardize the variables so that they have similar weights. First we rescale quantitative traits to be in the range 0 to 1 and divide by the number of diet categories to have similar influence as the diet variables. Next we rescale all ‘non diet’ traits to have similar influence to the diet traits by dividing by the number of diet categories divided by the number of categories for each substrate characteristic. Lastly, we combine these data into a single data frame for NMDS analysis.
#First
quant_traits_std <- decostand(all_traits[,3:4], 'range')/10
#Second
Mean_prop_mark_on_substrate_std <- all_traits[,5]/(10/2)
prop_vertical_std <- all_traits[,6]/(10/2)
prop_concave_std <- all_traits[,7]/(10/3)
prop_convex_std <- all_traits[,8]/(10/3)
all_traits_std <- cbind(quant_traits_std,
Mean_prop_mark_on_substrate_std,
prop_vertical_std, prop_concave_std,
prop_convex_std, all_traits[,9:18])Now we conduct the NMDS analysis of the standardized diet data.
nmds <- metaMDS(all_traits_std, distance = "bray", k = 3, trymax = 40)Then we can inspect the Sheppard plot and save the results as a dataframe.
stressplot(nmds)#plot(nmds, type = "t")
NMDS <- data.frame(NMDS1 = nmds$points[,1], NMDS2 = nmds$points[,2], MDS3 = nmds$points[,3])Finally, we generate the environmental vectors (correlations of variables with ordination axes).
set.seed(1)
vec.sp <- envfit(nmds$points, all_traits[,3:18], perm=1000, choices = c(1,2,3))
spp.scrs <- as.data.frame(scores(vec.sp, display = "vectors"))
spp.scrs$species <- rownames(spp.scrs)
colnames(spp.scrs) <- c("S_MDS1", "S_MDS2", "SMDS3", "species")Now we can plot the results and generate Figure 1 from the manuscript. For all figures generated in this workflow, we coded as much formatting as we could with R and then made minor stylistic changes in Inskscape. Thus, what you see here and throughout, are the raw figures.
#Plot results using ggplot2
#Merge with data that we want to use in plotting
NMDS <- cbind(ids, NMDS)
stuff <- ggplot(NMDS) +
geom_point(mapping = aes(x = NMDS1, y = NMDS2, shape = Site, colour = Species), size =4)+
geom_segment(aes(x=0,y=0,xend=S_MDS1, yend=S_MDS2), data = spp.scrs,
arrow = arrow(length = unit(0.5, "cm")),colour="black",inherit.aes=FALSE)+
geom_text(data=spp.scrs,aes(x=S_MDS1,y=S_MDS2,label=species),size=3)+
labs(title = "bray distance on all traits (standardized)", subtitle = "Stress = 0.04")
#####Now subset just the spp. scrs with the highest correlations
spp.scrs_sub <- spp.scrs[-c(5,11,12),]
#Categorize species scores into different variable types for plotting
variable_type <- as.factor(c(rep("substrate", 5), rep("algae",8)))
spp.scrs_sub <- cbind(variable_type, spp.scrs_sub)
spp.scrs_substrate <- subset(spp.scrs_sub, spp.scrs_sub$variable_type == "substrate")
spp.scrs_algae <- subset(spp.scrs_sub, spp.scrs_sub$variable_type == "algae")Fig1 <- ggplot(NMDS) +
geom_point(mapping = aes(x = NMDS1, y = NMDS2, shape = Site, colour = Species), size = 4)+
scale_colour_manual(values = samp_pal) +
geom_segment(aes(x=0, y=0, xend=S_MDS1, yend=S_MDS2), data = spp.scrs_substrate,
arrow = arrow(length = unit(0.5, "cm")), color = "red") +
geom_text(data = spp.scrs_substrate, aes(x = S_MDS1, y = S_MDS2, label = species),
nudge_y = c(-0.025, -0.025, -0.05, 0.05, 0.05)) +
geom_segment(aes(x=0,y=0,xend=S_MDS1, yend=S_MDS2), data = spp.scrs_algae,
arrow = arrow(length = unit(0.5, "cm")), color = "black") +
geom_text(data = spp.scrs_algae, aes(x = S_MDS1, y = S_MDS2, label = species),
nudge_y = c(-0.025, 0.025, 0.025, 0.025, 0.025, -0.025, -0.05, -0.025)) +
theme_classic(base_size = 15)
Fig1 <- Fig1 + coord_fixed()
Fig1Figure 1: Bray curtis distance on all traits; stress = 0.04
pdf("R_OUTPUT/Figure_1.pdf")
Fig1
invisible(dev.off())0k, on to the microbial data. First, we load the data packet produced by the final step of the DADA2 workflow, format sample names, and define groupings. We will use the sample names to define the different groups.
Groups
load("DADA2_DATA/combo_pipeline.rdata")
samples.out <- rownames(seqtab)
subject <- sapply(strsplit(samples.out, "[[:digit:]]"), `[`, 1)
# this splits the string at first instance of a digit
sample_name <- substr(samples.out, 1, 999) # use the whole string for individuals
genus <- substr(samples.out, 1, 2) # use the first two letters for genus
species <- substr(samples.out, 1, 5) # use the next three letters for species
sample_name## [1] "AcCoe01" "AcCoe02" "AcCoe03" "AcCoe04" "AcCoe05" "AcCoe06" "AcCoe07"
## [8] "AcCoe08" "AcTra01" "AcTra02" "AcTra03" "AcTra04" "AcTra05" "AcTra06"
## [15] "AcTra07" "AcTra08" "AcTra09" "ScTae01" "ScTae02" "ScTae03" "ScTae04"
## [22] "ScTae05" "ScTae06" "ScTae07" "ScTae08" "ScTae09" "ScVet01" "ScVet02"
## [29] "SpAur01" "SpAur02" "SpAur03" "SpAur04" "SpAur10" "SpAur11" "SpAur12"
## [36] "SpAur13" "SpChr01" "SpVir01" "SpVir02" "SpVir03" "SpVir04" "SpVir05"
## [43] "SpVir06" "SpVir07" "SpVir08" "SpVir09" "SpVir10" "SpVir11" "SpAur05"
## [50] "SpAur06" "SpAur07" "SpAur08" "SpAur09"
unique(genus)## [1] "Ac" "Sc" "Sp"
unique(species)## [1] "AcCoe" "AcTra" "ScTae" "ScVet" "SpAur" "SpChr" "SpVir"
And finally we define a sample data frame that holds the different groups we extracted from the sample names. On the right are a few samples and their different groups names.
#define a sample data frame
samdf <- data.frame(SamName = sample_name, Gen = genus, Sp = species)
rownames(samdf) <- samples.out
kable(samdf[c(1, 13, 20, 30, 44),1:3], row.names = FALSE) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "float_right") %>%
column_spec(1:3, width = "3.5cm")| SamName | Gen | Sp |
|---|---|---|
| AcCoe01 | Ac | AcCoe |
| AcTra05 | Ac | AcTra |
| ScTae03 | Sc | ScTae |
| SpAur02 | Sp | SpAur |
| SpVir07 | Sp | SpVir |
Abbreviations:
Next we create a phyloseq (ps) object with the Silva (slv) taxonomy. There is also a Greengenes (gg) annotation in the output file from DADA2 which can be used instead of the Silva annotation. Just change tax_silva to tax_gg. At this point we rename the amplicon sequence variants (ASVs) so the designations are a bit more user friendly. By default, DADA2 names each ASV by its unique sequence so that data can be directly compared across studies (which is great). But this convention can get cumbersome downstream, so we rename the ASVs using a simpler convention—ASV1, ASV2, ASV3, and so on, while retaining the exact sequences.
ps_slv <- phyloseq(otu_table(seqtab, taxa_are_rows = FALSE), sample_data(samdf),
tax_table(tax_silva)) # this create the phyloseq object
tax_table(ps_slv) <- cbind(tax_table(ps_slv), rownames(tax_table(ps_slv)))
taxa_names(ps_slv) <- paste0("ASV", seq(ntaxa(ps_slv))) # adding unique ASV names
tax_table(ps_slv) <- cbind(tax_table(ps_slv), rownames(tax_table(ps_slv)))
head(taxa_names(ps_slv))## [1] "ASV1" "ASV2" "ASV3" "ASV4" "ASV5" "ASV6"
ps_slv## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12479 taxa and 53 samples ]
## sample_data() Sample Data: [ 53 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 12479 taxa by 8 taxonomic ranks ]
And then we add two final columns with the actual ASV sequences and ASV IDs. This will be useful later when trying to export a fasta file.
colnames(tax_table(ps_slv)) <- c("Kingdom", "Phylum", "Class", "Order",
"Family", "Genus", "ASV_SEQ", "ASV_ID")At this point we have a completely unadulterated phyloseq object because it contains all the ASVs and all samples. Lets export the sequence and taxonomy tables for posterity sake.
write.table(tax_table(ps_slv), "R_OUTPUT/full_tax_table.txt", sep="\t", quote = FALSE, col.names=NA)
write.table(t(otu_table(ps_slv)), "R_OUTPUT/full_seq_table.txt", sep="\t", quote = FALSE, col.names=NA)
write.table(sample_data(ps_slv), "R_OUTPUT/full_sample_data.txt", sep="\t", quote = FALSE, row.names = FALSE)Remember three of these samples were omitted because we did not have replicates for the host species. Lets remove those samples. The only way we could figure out how to do this was by selecting the samples we wanted to keep. If you want to change the group of samples, modify the script accordingly.
ps_slv_base <- prune_samples(c("SpAur01", "SpAur02", "SpAur03", "SpAur04",
"SpAur10", "SpAur11", "SpAur12", "SpAur13", "SpVir01", "SpVir02", "SpVir03",
"SpVir04", "SpVir05", "SpVir06", "SpVir07", "SpVir08", "SpVir09", "SpVir10",
"SpVir11", "AcCoe01", "AcCoe02", "AcCoe03", "AcCoe04", "AcCoe05", "AcCoe06",
"AcCoe07", "AcCoe08", "AcTra01", "AcTra02", "AcTra03", "AcTra04", "AcTra05",
"AcTra06", "AcTra07", "AcTra08", "AcTra09", "ScTae01", "ScTae02", "ScTae03",
"ScTae04", "ScTae05", "ScTae06", "ScTae07", "ScTae08", "ScTae09", "SpAur05",
"SpAur06", "SpAur07", "SpAur08", "SpAur09"), ps_slv)
ps_slv_base## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12479 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 12479 taxa by 8 taxonomic ranks ]
0K, three samples gone. But we probably lost some ASVs when use we removed samples. So we need to get rid of any ASVs that have now a total of 0 reads. This will be our working phyloseq object.
ps_slv_work <- prune_taxa(taxa_sums(ps_slv_base) > 0, ps_slv_base)
ps_slv_work## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12040 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 12040 taxa by 8 taxonomic ranks ]
Looks like 439 ASVs were only found in those three samples.
These samples are intestinal communities and we assume that Chloroplast are not contributing to metabolism. These data could be useful later but for now lets create a phyloseq object without Chloroplast.
WARNING: the subset_taxa command removes anything that is NA for the specified taxonomic level or above. For example, lets say you run the subset_taxa command using Order != "Chloroplast". Seems like you should get a phyloseq object with everything except Chloroplast. But actually the command not only gets rid Chloroplast but everything else that has NA for Order and above. In our experience this is not well documented and we had to dig through the files to figure out what was happening.
Our dataset has 590 Chloroplast ASVs and running the command as is removed an additional 1244 ASVs. So lets see if we can get rid of just Chloroplast ASVs without removing everything that is unclassified at Order and above. To do this, we subset the taxa to generate a ps object of just Chloroplast, selected the ASV column only, turned it into a factor, and used this to remove Chloroplast from the ps object.
# generate a file with Chloroplast ASVs
CH1 <- subset_taxa(ps_slv_work, Order == "Chloroplast")
CH1 <- as(tax_table(CH1), "matrix")
CH1 <- CH1[, 8]
CH1df <- as.factor(CH1)
goodTaxaCH <- setdiff(taxa_names(ps_slv_work), CH1df)
ps_slv_work_no_cyano <- prune_taxa(goodTaxaCH, ps_slv_work)
ps_slv_work_no_cyano## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 11450 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 11450 taxa by 8 taxonomic ranks ]
This step removed 590 Chloroplast ASVs. Perfect.
And now we use the same approach to remove Mitochondria.
# generate a file with mitochondria ASVs
MT1 <- subset_taxa(ps_slv_work_no_cyano, Family == "Mitochondria")
MT1 <- as(tax_table(MT1), "matrix")
MT1 <- MT1[, 8]
MT1df <- as.factor(MT1)
goodTaxa <- setdiff(taxa_names(ps_slv_work_no_cyano), MT1df)
ps_slv_work_filt <- prune_taxa(goodTaxa, ps_slv_work_no_cyano)
ps_slv_work_filt## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 11144 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 11144 taxa by 8 taxonomic ranks ]
Sweet, looks like this removed 306 Mitochondria ASVs. Next we generate some summary data for each sample.
One last thing to do is to create a merged phyloseq object grouped by host species. This will come in handy later for some analyses. Basically we collapsed all samples from the same host species together.
mergedGP <- merge_samples(ps_slv_work_filt, "Sp")
SD <- merge_samples(sample_data(ps_slv_work_filt), "Sp")
mergedGP## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 11144 taxa and 5 samples ]
## sample_data() Sample Data: [ 5 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 11144 taxa by 8 taxonomic ranks ]
sample_names(mergedGP)## [1] "AcCoe" "AcTra" "ScTae" "SpAur" "SpVir"
Great, still the same number of ASVs and now only 5 “samples” corresponding to the 5 species.
There are now the several phyloseq objects to chose from and, using the above methods, additional objects can easily be created.
ps_slv –> phyloseq dataset with all 53 samples, all ASVs.ps_slv_base –> phyloseq dataset with 50 samples, all ASVs (this is not very useful).ps_slv_work –> phyloseq dataset with 50 samples, 0 read ASVs removed.ps_slv_work_filt –> phyloseq dataset with 50 samples, ASVs from Mitochondria and Cyanobacteria removed.mergedGP –> ps_slv_work_filt phyloseq dataset collapsed by host species.Before we do anything else, lets generate summary data for each host. We can generate a summary report for any ps object but we will use the object with mitochondria and chlorplasts removed, as well as the low replicate host species removed. We will also add details about each host. The table is displayed below. We can use these data when we upload the original fastq files to sequence read archives. Later on we will also add alpha diversity stats and save the table.
total_reads <- sample_sums(ps_slv_work_filt)
total_reads <- as.data.frame(total_reads, make.names = TRUE)
total_reads <- total_reads %>% rownames_to_column("host_ID")
total_asvs <- estimate_richness(ps_slv_work_filt, measures = "Observed")
total_asvs <- total_asvs %>% rownames_to_column("host_ID")
sam_details <- sample_data(ps_slv)
sam_details <- sam_details %>% mutate(genus = case_when(
Gen == "Ac" ~ "Acanthurus",
Gen == "Sc" ~ "Scarus",
Gen == "Sp" ~ "Sparisoma"))
sam_details <- sam_details %>% mutate(species = case_when(
Sp == "AcCoe"~ "coeruleus",
Sp == "AcTra"~ "tractus",
Sp == "ScTae"~ "taeniopterus",
Sp == "SpAur"~ "aurofrenatum",
Sp == "SpVir"~ "viride"))
#Sp == "SpChr"~ "chrysopterum",
#Sp == "ScVet"~ "vetula"
sam_details <- sam_details %>% mutate(common_name = case_when(
Sp == "AcCoe" ~ "blue tang surgeonfish",
Sp == "AcTra" ~ "fiveband surgeonfish",
Sp == "ScTae" ~ "princess parrotfish",
Sp == "SpAur" ~ "redband parrotfish",
Sp == "SpVir" ~ "stoplight parrotfish"))
#Sp == "SpChr" ~ "redtail parrotfish",
#Sp == "ScVet" ~ "queen parrotfish"))
sam_details <- sam_details %>% mutate(NCBI_txid = case_when(
Sp == "AcCoe" ~ "157585",
Sp == "AcTra" ~ "1316013",
Sp == "ScTae" ~ "544418",
Sp == "SpAur" ~ "59663",
Sp == "SpVir" ~ "59666"))
#Sp == "SpChr" ~ "51766",
#Sp == "ScVet" ~ "84543"))
sam_details <- sam_details[-c(2,3)]
colnames(sam_details) <- c("host_ID", "host_genus", "host_species",
"full_name", "NCBI_txid")
merge_tab <- merge(sam_details, total_reads, by = "host_ID")
merge_tab2 <- merge(merge_tab, total_asvs, by = "host_ID")
colnames(merge_tab2) <- c("host_ID", "host_genus", "host_species",
"common_name", "NCBI_txid", "total_reads", "total_ASVs")
# We also have a datatable containing metrics for each host. Lets bring this in
# and merge with the summary table
metrics <- read.table("MANUAL_INPUT/host_metrics.txt", sep = "\t", header = TRUE)
host_details <- merge(merge_tab2, metrics, by = "host_ID")
colnames(host_details) <- c("host_ID", "host_genus", "host_species",
"common_name", "NCBI_txid", "total_reads", "total_ASVs",
"collection_date", "phase", "weight", "total_length",
"foregut_length", "midgut_length",
"hindgut_length", "total_gut_length")
datatable(host_details, rownames = FALSE, width = "100%", colnames = c(
"host_ID", "host_genus", "host_species", "common_name", "NCBI_txid",
"total_reads", "total_ASVs", "Collection_date", "Phase", "Weight (g)",
"Total length (cm)", "Fore gut length (cm)", "Mid gut length (cm)",
"Hind gut length (cm)", "Total gut length (cm)"),
caption = htmltools::tags$caption(style = "caption-side: bottom; text-align: left;",
"Table: ", htmltools::em("Sample summary.")), extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", pageLength = 5, lengthMenu = c(5, 10, 25, 50), buttons = c("csv",
"copy"), scrollX = TRUE, scrollCollapse = TRUE))Now we have a nice little summary table about each sample—genus/species, common name, number of reads, number of ASVs, etc. All of this info can be used when submitting samples to sequence read archives. Once we conduct alpha diversity estimates below, we will add that data to the table above and export as Supplementary Table 3.
What are the dominant taxa in this system? How diverse are these communities? How similar are samples to each other?
Before we can start to understand a system, we need to know something about its parts. So lets start with a quick look at class-level diversity. Of course, you can change this to any taxonomic rank you wish. Here we created a sortable table that has the total number of reads and ASVs for each class
# generate the ASV table
tax_asv <- table(tax_table(ps_slv_work_filt)[, "Class"], exclude = NULL,
dnn = "Taxa")
tax_asv <- as.data.frame(tax_asv, make.names = TRUE)
# generate the reads table
tax_reads <- factor(tax_table(ps_slv_work_filt)[, "Class"])
tax_reads <- apply(otu_table(ps_slv_work_filt), MARGIN = 1, function(x)
{
tapply(x, INDEX = tax_reads, FUN = sum, na.rm = TRUE, simplify = TRUE)
})
tax_reads <- as.data.frame(tax_reads, make.names = TRUE)
tax_reads <- cbind(tax_reads, reads = rowSums(tax_reads))
tax_reads <- tax_reads[51]
tax_reads <- setDT(tax_reads, keep.rownames = TRUE)[]
# merge the two tables and make everything look pretty
# in an interactive table
taxa_read_asv_tab <- merge(tax_reads, tax_asv, by.x = "rn", by.y = "Taxa")
names(taxa_read_asv_tab) <- c("Taxa", "total reads", "total ASVs")
write.table(taxa_read_asv_tab, "R_OUTPUT/Supplementary_Table_4.txt", sep = "\t",
row.names = FALSE, quote = FALSE)
datatable(taxa_read_asv_tab, rownames = FALSE, width = "100%", colnames = c(
"Taxa", "total reads", "total ASVs"),
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Supplementary Table 4: ", htmltools::em("Total reads & ASVs by Class")), extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", pageLength = 5, lengthMenu = c(5, 10, 35, 70), buttons = c("csv",
"copy")))Looks like Proteobacteria, Firmicutes, Fusobacteria, Planctomycetes, and Bacteroidetes dominate in the read department. Curiously, Fusobacteria has comparatively low ASV richness.
Much of the analyses we do from here on out will be at the Class & Family levels. We chose not to focus on the Genus level because there simply is not enough resolution in our dataset to build a cohesive story. This is because these fish are (microbially) understudied and we are dealing with short read data. On the other hand, Phylum level is too coarse for groups like Proteobacteria and Firmicutes. Order did not provide any additional information and can be cumbersome for taxa with poorly resolved lineages. Depending on the dataset, you may want to change your strategy.
Lets take a closer look Class-level taxonomic content of these communities. There are numerous ways to do this but here we chose to collapse samples by host species and display the relative abundance of the most dominant taxa. We also generated an alternative view of taxonomic composition for individual samples—separate bar plots that are included as Supplementary Figure 1 (see below for code).
Stacked bar charts are not the best but we like them for a birds eye view of the data. Here we calculate the relative abundance of taxa for each host species at the Class level. It turns out this is not too easy in phyloseq and there is a lot of (messy) code.
# calculate the averages and merge by species
ps_slv_filt_AVG <- transform_sample_counts(ps_slv_work_filt, function(x) x/sum(x))
mergedGP_BAR <- merge_samples(ps_slv_filt_AVG, "Sp")
SD_BAR <- merge_samples(sample_data(ps_slv_filt_AVG), "Sp")
# merge taxa by rank. If you choose a different rank be sure to change
# the rank throughout this code chunk
mdata_phy <- tax_glom(mergedGP_BAR, taxrank = "Class", NArm = FALSE)
mdata_phyrel <- transform_sample_counts(mdata_phy, function(x) x/sum(x))
meltd <- psmelt(mdata_phyrel)
meltd$Class <- as.character(meltd$Class)
# calculate the total relative abundance for all taxa
means <- ddply(meltd, ~Class, function(x) c(mean = mean(x$Abundance)))
means$mean <- round(means$mean, digits = 8)
taxa_means <- means[order(-means$mean), ] # this order in decending fashion
taxa_means <- format(taxa_means, scientific = FALSE) # ditch the sci notation Since our goal is to generate a figure and we only have 9 colors, some taxa will need to be put into an Other category. We can define ‘Other’ however we like so lets take a look at the overall relative abundance of each Class.
datatable(taxa_means, rownames = FALSE, width = "65%", colnames = c("Class",
"mean"),
caption = htmltools::tags$caption(style =
"caption-side: bottom; text-align: left;", "Table 2: ",
htmltools::em("Class-level relative abundance.")),
extensions = "Buttons", options = list(columnDefs =
list(list(className = "dt-center", targets = "_all")),
dom = "Blfrtip", pageLength = 5, lengthMenu = c(5, 10, 50, 70),
buttons = c("csv", "copy")))write.table(taxa_means, "R_OUTPUT/class_rel_abund.txt", sep = "\t", row.names = FALSE,
quote = FALSE)Inspecting the table it looks like if we choose a cutoff of 2% (0.02) we get 9 taxa—sounds pretty good. The rest go into the ‘Other’ category. No matter what, we will always gloss over some groups using such a coarse approach. But as we will see later, some of these low abundance groups will reappear when we look at the level of individual ASVs.
Here we define the Other category.
# Here we conglomerate at 2%.
Other <- means[means$mean <= 0.02, ]$Class
# or you can chose specifc taxa like this
#Other_manual <- c("list", "taxa", "in", "this", "format")
length(Other)## [1] 63
At a 2% abundance cutoff, 63 Classes are grouped into the ‘Other’ category. Great, now we can craft the bar chart. And here are the taxa that will go into the chart:
meltd[meltd$Class %in% Other, ]$Class <- "Other"
samp_names <- aggregate(meltd$Abundance, by = list(meltd$Sample), FUN = sum)[, 1]
.e <- environment()
meltd[, "Class"] <- factor(meltd[, "Class"], sort(unique(meltd[, "Class"])))
meltd <- meltd[order(meltd[, "Class"]), ]
# Here we order Classes by the Phylum they belong to.
levels(meltd$Class)## [1] "Alphaproteobacteria" "Bacteroidia" "Clostridia"
## [4] "Deltaproteobacteria" "Erysipelotrichia" "Fusobacteriia"
## [7] "Gammaproteobacteria" "Other" "Oxyphotobacteria"
## [10] "Planctomycetacia"
meltd$Class <- factor(meltd$Class, levels = c("Bacteroidia", "Clostridia",
"Erysipelotrichia", "Fusobacteriia", "Alphaproteobacteria", "Deltaproteobacteria",
"Gammaproteobacteria", "Planctomycetacia", "Oxyphotobacteria", "Other")) It took some tweaking to get the bar chart to look just right—so there is a lot of code here—and it could most certainly be better. While we’re at it, we will also save a copy of the figure so we can tweak it later and make it look pretty.
fig2A <- ggplot(meltd, aes_string(x = "Sample", y = "Abundance", fill = "Class"),
environment = .e, ordered = TRUE, xlab = "x-axis label", ylab = "y-axis label")
fig2A <- fig2A + geom_bar(stat = "identity", position = position_stack(reverse = TRUE),
width = 0.95) + coord_flip() + theme(aspect.ratio = 1/2)
fig2A <- fig2A + scale_fill_manual(values = friend_pal)
fig2A <- fig2A + theme(axis.text.x = element_text(angle = 0, hjust = 0.45,
vjust = 1))
fig2A <- fig2A + guides(fill = guide_legend(override.aes = list(colour = NULL),
reverse = FALSE)) + theme(legend.key = element_rect(colour = "black"))
fig2A <- fig2A + labs(x = "Host species", y = "Relative abundance (% total reads)",
title = "Abundance of bacterial taxa across host species")
fig2A <- fig2A + theme(axis.line = element_line(colour = "black"), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_rect(colour = "black",
fill = NA, size = 1))
fig2AFigure 2A
pdf("R_OUTPUT/Figure_2A.pdf")
fig2A
invisible(dev.off())Armed with a picture of taxonomic composition we can move on to diversity estimates.
We took two views of overall community diversity. Just click on a tab to see the code and results.
Alpha diversity describes the diversity in a sample or site. There are several alpha diversity metrics available in phyloseq: Observed, Chao1, ACE, Shannon, Simpson, InvSimpson, Fisher. Play around to see how different metrics change or confirm these results.
Here we want to know if diversity is significantly different across host species. In order to do that we need to know if we should run a parametric or non-parametric test, and for that we need to know if our data is normally distributed. Most of the ideas/code for alpha (and subsequent beta) diversity statistics come from this workshop tutorial by Kim Dill-McFarland and Madison Cox.
First we run the diversity estimates, add these data to our summary table, and save a copy of this table.
diversity <- estimate_richness(ps_slv_work_filt, measures=c(
"Observed", "Chao1", "ACE", "Shannon", "Simpson", "InvSimpson", "Fisher"))
diversity_calc <- diversity %>% rownames_to_column("host_ID")
# round values
diversity_calc[c(3,5,10)] <- round(diversity_calc[c(3,5,10)], 1)
diversity_calc[c(4,6,7,9)] <- round(diversity_calc[c(4,6,7,9)], 2)
diversity_calc[8] <- round(diversity_calc[8], 3)
host_summary <- merge(host_details, diversity_calc)
host_summary$Observed <- NULL
write.table(host_summary, "R_OUTPUT/Supplementary_Table_3.txt",
sep = "\t", row.names = FALSE, quote = FALSE)Next, we add the diversity estimates to our phyloseq object, and test if the data are normally distributed using Shapiro-Wilk Normality test. We will focus on the inverse Simpson and Shannon diversity estimates and Chao’s richness estimate but this approach can be used for any metric.
# Convert to ps object
sample_div <- sample_data(diversity)
# Create new ps object with diversity estimates added to sample_data
ps_slv_work_filt_div <- merge_phyloseq(ps_slv_work_filt, sample_div)
# Run Shapiro test
shapiro_test_Shan <- shapiro.test(sample_data(ps_slv_work_filt_div)$Shannon)
shapiro_test_invSimp <- shapiro.test(sample_data(ps_slv_work_filt_div)$InvSimpson)
shapiro_test_Chao1 <- shapiro.test(sample_data(ps_slv_work_filt_div)$Chao1)
shapiro_test_Observed <- shapiro.test(sample_data(ps_slv_work_filt_div)$Observed)Shapiro-Wilk Normality Test for Shannon index.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$Shannon
## W = 0.98228, p-value = 0.6513
Shapiro-Wilk Normality Test for inverse Simpson index.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$InvSimpson
## W = 0.60454, p-value = 2.276e-10
Shapiro-Wilk Normality Test for Chao1 richness estimator.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$Chao1
## W = 0.89305, p-value = 0.0002855
Shapiro-Wilk Normality Test for Observed ASV richness estimator.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$Observed
## W = 0.89591, p-value = 0.0003531
Ok, since the p-values are significant for the inverse Simpson, Chao richness, and Observed ASV richness we reject the null hypothesis that these data are normally distributed. However, the Shannon estimates appear normally distributed. So lets see if diversity is significantly different between host species based on the Shannon index.
Since the Shannon data is normally distributed we can test for significance using ANOVA (a parametric test).
sampledataDF <- data.frame(sample_data(ps_slv_work_filt_div))
aov.shannon = aov(Shannon ~ Sp, data = sampledataDF)
#Call for the summary of that ANOVA, which will include P-values
summary(aov.shannon)## Df Sum Sq Mean Sq F value Pr(>F)
## Sp 4 19.13 4.782 3.906 0.00833 **
## Residuals 45 55.10 1.224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Ok, the results of the ANOVA are significant. Here we use the Tukey’s HSD (honestly significant difference) post-hoc test to determine which pairwise comparisons are different.
TukeyHSD(aov.shannon)## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Shannon ~ Sp, data = sampledataDF)
##
## $Sp
## diff lwr upr p adj
## AcTra-AcCoe 0.7421379 -0.78560423 2.26987998 0.6431490
## ScTae-AcCoe 0.4727303 -1.05501185 2.00047236 0.9030495
## SpAur-AcCoe -0.9230990 -2.33591246 0.48971439 0.3551484
## SpVir-AcCoe 0.3323910 -1.12853187 1.79331396 0.9664219
## ScTae-AcTra -0.2694076 -1.75153517 1.21271993 0.9852679
## SpAur-AcTra -1.6652369 -3.02859596 -0.30187785 0.0096995
## SpVir-AcTra -0.4097468 -1.82289999 1.00340634 0.9218562
## SpAur-ScTae -1.3958293 -2.75918834 -0.03247023 0.0424253
## SpVir-ScTae -0.1403392 -1.55349237 1.27281396 0.9985589
## SpVir-SpAur 1.2554901 -0.03255018 2.54353034 0.0593081
Looks like Sparisoma aurofrenatum is significantly different from Scarus taeniopterus and Acanthurus tractus.
Now we can look at the results on the inverse Simpson diversity and Chao’s richness. Since host species is categorical, we use Kruskal-Wallis (non-parametric equivalent of ANOVA) to test for significance.
Kruskal-Wallis of inverse Simpson index.
#library(FSA)
#dunnTest(InvSimpson ~ Sp, data = sampledataDF, method="bh")
kruskal.test(InvSimpson ~ Sp, data = sampledataDF)##
## Kruskal-Wallis rank sum test
##
## data: InvSimpson by Sp
## Kruskal-Wallis chi-squared = 13.779, df = 4, p-value = 0.008034
Kruskal-Wallis of Chao1 richness estimator.
#dunnTest(Chao1 ~ Sp, data = sampledataDF, method="bh")
kruskal.test(Chao1 ~ Sp, data = sampledataDF)##
## Kruskal-Wallis rank sum test
##
## data: Chao1 by Sp
## Kruskal-Wallis chi-squared = 13.992, df = 4, p-value = 0.007321
Kruskal-Wallis of Observed ASV richness index.
#library(FSA)
#dunnTest(Observed ~ Sp, data = sampledataDF, method="bh")
kruskal.test(Observed ~ Sp, data = sampledataDF)##
## Kruskal-Wallis rank sum test
##
## data: Observed by Sp
## Kruskal-Wallis chi-squared = 14.153, df = 4, p-value = 0.006822
For the inverse Simpson, Chao1, and Observed richness the results of the Kruskal-Wallis rank sum test are significant. So we can look at pairwise comparisons using Wilcoxon rank sum test for post-hoc analysis.
Pairwise significance test for inverse Simpson index.
pairwise.wilcox.test(sampledataDF$InvSimpson, sampledataDF$Sp, p.adjust.method="fdr")##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: sampledataDF$InvSimpson and sampledataDF$Sp
##
## AcCoe AcTra ScTae SpAur
## AcTra 0.545 - - -
## ScTae 0.963 0.545 - -
## SpAur 0.042 0.042 0.108 -
## SpVir 0.545 0.789 0.545 0.004
##
## P value adjustment method: fdr
Pairwise significance test for Chao1 richness estimator.
pairwise.wilcox.test(sampledataDF$Chao1, sampledataDF$Sp, p.adjust.method="fdr")##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: sampledataDF$Chao1 and sampledataDF$Sp
##
## AcCoe AcTra ScTae SpAur
## AcTra 0.628 - - -
## ScTae 0.617 0.666 - -
## SpAur 0.030 0.030 0.019 -
## SpVir 0.666 0.628 0.304 0.046
##
## P value adjustment method: fdr
Pairwise significance test for Observed ASV richness index.
pairwise.wilcox.test(sampledataDF$Observed, sampledataDF$Sp, p.adjust.method="fdr")##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: sampledataDF$Observed and sampledataDF$Sp
##
## AcCoe AcTra ScTae SpAur
## AcTra 0.677 - - -
## ScTae 0.677 0.730 - -
## SpAur 0.026 0.026 0.019 -
## SpVir 0.730 0.677 0.304 0.039
##
## P value adjustment method: fdr
Again we see that only Sp. aurofrenatum is significantly different from the other hosts. For the inverse Simpson index, Sp. aurofrenatum is significantly different from three of the four host species and Chao1 richness estimator, Sp. aurofrenatum is significantly different from all other host species. Now we can plot the results. Click on the back to diversity tabs link to go to the alpha diversity plots.
Here we plot the results of Shannon diversity index. We will save a copy of the figure for later tweaking. We use the color palette described above to delineate host species.
fig2B <- plot_richness(ps_slv_work_filt, x = "Sp",
measures = c("Shannon"),
color = "Sp", nrow = 1)
fig2B <- fig2B + geom_boxplot() + geom_jitter(width = 0.05)
fig2B <- fig2B + scale_colour_manual(values = samp_pal) +
labs(x = "Host species",
y = "Diversity",
title = "Alpha diversity of bacterial
communities in herbivorous reef fish")
#fig2B + geom_boxplot(aes(colour = black))
#fig2B <- fig2B + theme_bw() + geom_point(size = 2.5, aes(color = Sp)) +
fig2BFigure 2B
pdf("R_OUTPUT/Figure_2B.pdf")
fig2B
invisible(invisible(dev.off()))Click on the back to diversity tabs to explore correlations with alpha diversity.
Beta diversity basically tells us how similar or dissimilar samples are to one another. Phyloseq offers several ordination methods and distance metrics. Here we use non metric multidimensional scaling (NMDS) coupled with Jensen–Shannon divergence. We also save a copy of the figure for later tweaking. To see the full output of the NMDS analysis, remove the results = 'hide' tag from the code chunk.
set.seed(3131)
ord.nmds.jsd_slv <- ordinate(ps_slv_work_filt, method = "NMDS", distance = "jsd")
stressplot(ord.nmds.jsd_slv)##
## Call:
## metaMDS(comm = ps.dist)
##
## global Multidimensional Scaling using monoMDS
##
## Data: ps.dist
## Distance: user supplied
##
## Dimensions: 2
## Stress: 0.1646318
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation
## Species: scores missing
We see that a convergent solution was reached around 20 iterations and our stress is below 0.20, meaning that 2-axes are sufficient to view the data. Generally, we are looking for stress values below 0.2. If the stress values are high, you may need to add more axes to the ordination. Lets visualize the plot.
fig2C <- plot_ordination(ps_slv_work_filt, ord.nmds.jsd_slv, color = "Sp",
label = "SamName", title = "Jensen-Shannon divergence")
fig2C <- fig2C + geom_point(size = 4) + geom_point(shape = 1,
size = 3.6, colour = "black", stroke = 0.75) # + xlim(-0.4, 0.4) + ylim(-0.4, 0.4)
fig2C <- fig2C + scale_colour_manual(values = samp_pal)
fig2C <- fig2C + theme(axis.line = element_line(colour = "black"), panel.background = element_blank(),
panel.grid.major = element_line("grey"), panel.grid.minor = element_line("grey"),
panel.border = element_rect(colour = "black", fill = NA, size = 1)) +
theme(legend.key = element_rect(colour = "black"))
fig2C <- fig2C + coord_fixed()
fig2C <- fig2C + stat_ellipse(type = "t") + theme_bw()
fig2CFigure 2C
pdf("R_OUTPUT/Figure_2C.pdf")
fig2C
invisible(dev.off())So we can see some clustering within groups and spread between groups, but this is not a test for statistical differences. Do microbial communities differ significantly by host taxa? Click on the back to diversity tabs to see how we employ some statistical tests of beta-diversity.
To test whether microbial communities differ by host species we can use permutational analysis of variance (PERMANOVA) or analysis of similarity (ANOSIM). PERMANOVA does not assume normality but does assume equal beta dispersion between groups. We will test beta dispersion below.
First we use the adonis function in vegan to run a PERMANOVA test. This will tell us whether host species have similar centroids or not.
set.seed(1911)
fish.jsd <- phyloseq::distance(ps_slv_work_filt, method = "jsd")
sampledf <- data.frame(sample_data(ps_slv_work_filt))
fish_adonis <- adonis(fish.jsd ~ Sp, data = sampledf, permutations = 1000)
fish_adonis##
## Call:
## adonis(formula = fish.jsd ~ Sp, data = sampledf, permutations = 1000)
##
## Permutation: free
## Number of permutations: 1000
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## Sp 4 4.4045 1.10113 16.52 0.59489 0.000999 ***
## Residuals 45 2.9995 0.06665 0.40511
## Total 49 7.4040 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These results indicate that centroids are significantly different across host species meaning that communities are different by host species.
We can also use the pairwiseAdonis package for pair-wise PERMANOVA analysis.
pairwise.adonis(fish.jsd, factors = sampledf$Sp, p.adjust.m = "bonferroni")Here we see again we see that communities are different by host species.
However, PERMANOVA assumes equal beta dispersion so we will use the betadisper function from the vegan package to calculate beta dispersion values.
beta_adonis <- betadisper(fish.jsd, sampledf$Sp, bias.adjust = TRUE)
beta_adonis##
## Homogeneity of multivariate dispersions
##
## Call: betadisper(d = fish.jsd, group = sampledf$Sp, bias.adjust =
## TRUE)
##
## No. of Positive Eigenvalues: 34
## No. of Negative Eigenvalues: 15
##
## Average distance to median:
## AcCoe AcTra ScTae SpAur SpVir
## 0.2980 0.3098 0.1845 0.1816 0.2570
##
## Eigenvalues for PCoA axes:
## PCoA1 PCoA2 PCoA3 PCoA4 PCoA5 PCoA6 PCoA7 PCoA8
## 1.8547 1.5977 0.9602 0.7234 0.5659 0.3413 0.2609 0.2427
And then a pair-wise Permutation test for homogeneity of multivariate dispersions using permutest (again from the vegan package).
permutest(beta_adonis, pairwise=TRUE, permutations=1000)##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 1000
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 4 0.14600 0.036500 4.1891 1000 0.007992 **
## Residuals 45 0.39209 0.008713
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
## AcCoe AcTra ScTae SpAur SpVir
## AcCoe 0.8051948 0.0029970 0.0119880 0.2797
## AcTra 0.8147277 0.0159840 0.0139860 0.2557
## ScTae 0.0040024 0.0160930 0.9440559 0.0420
## SpAur 0.0125449 0.0160815 0.9425028 0.0699
## SpVir 0.2735320 0.2661694 0.0479863 0.0669727
These results are significant, meaning that host species have different dispersions. Looking at the pairwise p-values and permuted p-value, we see that the significant differences (p-value < 0.05) are between:
This means we are less confident that the PERMANOVA result is a real result, and that the result is possibly due to differences in group dispersions.
We can also use Analysis of Similarity (ANOSIM)—which does not assume equal group variances—to test whether overall microbial communities differ by host species.
spgroup <- get_variable(ps_slv_work_filt, "Sp")
fish_anosim <- anosim(distance(ps_slv_work_filt, "jsd"), grouping = spgroup)
summary(fish_anosim)##
## Call:
## anosim(x = distance(ps_slv_work_filt, "jsd"), grouping = spgroup)
## Dissimilarity:
##
## ANOSIM statistic R: 0.8544
## Significance: 0.001
##
## Permutation: free
## Number of permutations: 999
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0556 0.0758 0.0873 0.1111
##
## Dissimilarity ranks between and within classes:
## 0% 25% 50% 75% 100% N
## Between 94 462.50 726.5 977.25 1225 992
## AcCoe 36 185.50 290.5 360.00 582 28
## AcTra 14 162.25 322.0 510.50 789 36
## ScTae 9 34.75 74.5 122.00 419 36
## SpAur 1 30.50 76.0 137.50 562 78
## SpVir 13 95.00 177.0 273.00 584 55
And the AN0SIM result is significant meaning that host species influences microbial community composition.
At this point we have a good handle on the diversity of the intestinal microbiomes of these herbivorous reef fish. We know that communities are dominated by the same broad-level taxonomic groups. The beta diversity analysis demonstrates that communities partition along host species. Now we want to determine which ASVs are driving these patterns and assess their distribution in nature using publicly available data. To accomplish this task we leave the R environment and employ some additional tools. To summarize, our goals here are to:
- identify differentially abundant (DA) ASVs across host species,
- find closest database matches to DA ASVs, and
- perform phylogenetic reconstruction on DA ASVs and top hits.
We used LDA Effect Size (LEfSe) to identify differentially abundant (DA) ASVs across host species and the MicrobiomeAnalyst webserver to run the analysis. There are also many other great tools on the MicrobiomeAnalyst webserver by the way. We needed three files for the input—an ‘OTU’ table, a metadata file containing sample information, and a taxonomy table. We generate these tables with the code below.
########## OTU table
OTU1 <- as(otu_table(ps_slv_work_filt), "matrix")
# transpose if necessary
# Coerce to data.frame
OTUdf <- as.data.frame(t(OTU1))
setDT(OTUdf, keep.rownames = TRUE)[]
write.table(OTUdf, "R_OUTPUT/seq_tab_for_core.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
colnames(OTUdf)[1] <- "#NAME"
write.table(OTUdf, "LEfSe_INPUT/LEfSe_INPUT_seq_tab.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
############ TAX Table
# Remember in the `tax_table` we added the last columns as the actual sequence
# of each ASV and the ASV_ID. We do not need those here.
# So lets only keep the first 6 columns (the taxonomic lineage)
TAX1 <- as(tax_table(ps_slv_work_filt), "matrix")
TAXdf <- as.data.frame(TAX1)
setDT(TAXdf, keep.rownames = TRUE)[]
colnames(TAXdf)[1] <- "#TAXONOMY"
TAXdf <- TAXdf[, 1:6]
write.table(TAXdf, "LEfSe_INPUT/LEfSe_INPUT_tax_tab.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
############ Metadata file
meta_file <- data.frame(NAME = sample_name, SampleType = species, Gen = genus)
rownames(meta_file) <- samples.out
colnames(meta_file) <- c("#NAME", "Species", "Genus" )
# but we still have those three samples that need to be removed
meta_file <- filter(meta_file, Species != "SpChr" & Species != "ScVet")
write.table(meta_file, "LEfSe_INPUT/LEfSe_INPUT_metadata.txt", sep = "\t", row.names = FALSE,
quote = FALSE)And once we have the three files, we head over to the MicrobiomeAnalyst webserver and upload the files. Be sure to select Silva taxonomy in the drop-down menu.
Check the data summary after uploading the files:
Cool, all looks good. Hit Proceed. Here are the settings we used for the different step:
Minimum count = 20, Prevalence in samples (%) = 20, and Percentage to remove (%) = 0. This removed 3796 low abundant ASVs.Data rarefying = Do not rarefy my data,Data scaling = Total sum scaling (TSS), andData transformation = Do not transform my data.Log LDA score = 4 & Adjusted p-value cutoff = 0.0001. We specifically chose these values because we found that they eliminated spurious results such as DA ASVs that were really abundant in a few samples but not consistent across an entire group.The result was 59 differentially abundant (DA) ASVs.
We can inspect and save the results of the LEfSe analysis. The table shows the Linear discriminant analysis (LDA) scores, P-values adjusted for multiple testing, and False Discovery Rate (FDR) values from the LEfSe analysis. Normalized read abundance values for each host species are also given.
lefse_tab <- read.table("MANUAL_INPUT/lefse_results.txt", header = TRUE,
sep = "\t", check.names = FALSE)
write.table(lefse_tab, "R_OUTPUT/Supplementary_Table_5.txt", quote = FALSE, sep = "\t", row.names = FALSE)
datatable(lefse_tab, rownames = FALSE, width = "100%", caption =
htmltools::tags$caption(style = "caption-side: bottom; text-align: left;",
"Table 4: ", htmltools::em("Results of LEfSe analysis.")), extensions = "FixedColumns",
"Buttons", options = list(columnDefs = list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5, 6, 7, 8))), dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 60), buttons = c("csv", "copy"), scrollX = TRUE,
scrollCollapse = TRUE))Before getting knee deep in the DA ASV analysis lets see if we can’t identify some core elements, or ASVs, to these fish. First we need a mothur-formatted .shared file.
cm <- read.table("R_OUTPUT/seq_tab_for_core.txt", sep = "\t", header = TRUE, row.names=1)
cm_t <- t(cm)
class(cm_t)## [1] "matrix"
cm_df <- as.data.frame(cm_t)
class(cm_df)## [1] "data.frame"
numcols <- ncol(cm_df)
cm_df <- cm_df %>% tibble::rownames_to_column("Group")
cm_df <- cm_df %>% mutate(label = 0.03, numOtus = numcols) %>% select(label, Group, numOtus, everything())
write.table(cm_df, "R_OUTPUT/ps_slv_work_filt.txt", quote = FALSE, sep = "\t", row.names = FALSE)
####COMBINE by fish species
cm_Merge <- cm %>% tibble::rownames_to_column("ASV")
cm_Merge <- cm_Merge %>% mutate(AcCoe =
AcCoe01+AcCoe02+AcCoe03+AcCoe04+AcCoe05+
AcCoe06+AcCoe07+AcCoe08)
cm_Merge <- cm_Merge %>% mutate(AcTra =
AcTra01+AcTra02+AcTra03+AcTra04+AcTra05+
AcTra06+AcTra07+AcTra08+AcTra09)
cm_Merge <- cm_Merge %>% mutate(ScTra =
ScTae01+ScTae02+ScTae03+ScTae04+ScTae05+
ScTae06+ScTae07+ScTae08+ScTae09)
cm_Merge <- cm_Merge %>% mutate(SpAur =
SpAur01+SpAur02+SpAur03+SpAur04+SpAur05+
SpAur06+SpAur07+SpAur08+SpAur09+SpAur10+
SpAur11+SpAur12+SpAur13)
cm_Merge <- cm_Merge %>% mutate(SpVir =
SpVir01+SpVir02+SpVir03+SpVir04+SpVir05+
SpVir06+SpVir07+SpVir08+SpVir09+SpVir10+
SpVir11)
cm_Merge <- cm_Merge %>% select(ASV, AcCoe, AcTra, ScTra, SpAur, SpVir)
cm_Merge_2 <- cm_Merge[,-1]
rownames(cm_Merge_2) <- cm_Merge[,1]
cm_Merge_2_t <- t(cm_Merge_2)
class(cm_Merge_2_t)## [1] "matrix"
cm_Merge_2_df <- as.data.frame(cm_Merge_2_t)
class(cm_Merge_2_df)## [1] "data.frame"
cm_Merge_2_df <- cm_Merge_2_df %>% tibble::rownames_to_column("Group")
cm_Merge_2_df <- cm_Merge_2_df %>% mutate(label = 0.03, numOtus = numcols) %>% select(label, Group, numOtus, everything())
write.table(cm_Merge_2_df, "R_OUTPUT/ps_slv_work_filt_combine.txt", quote = FALSE, sep = "\t", row.names = FALSE)Next we use the output to run get.coremicrobiome in mothur.
Next we wanted to know where else these ASVs had been detected in nature. There is a huge wealth of publicly available sequence information from many studies and habitats. We can use this information to get a better idea of the distribution and habitat specificity of the DA ASVs. To accomplish this we performed the following steps:
First we needed a phyloseq object that only contained the DA ASVs. To do this, we passed an object consisting of just these 59 ASVs (from the LEfSe analysis) to the phyloseq function prune_taxa. We needed two different ps objects, one from the unmerged object (ps_slv_work_filt) and the other from the merged-by-genus object (mergedGP).
Next we needed a fasta file of our DA ASVs. We could not find an easy way to export a fasta file from the new ps objects. So we tried this using the tax_table. This approach works but, well, it is not very elegant. If you want a fasta file from any other ps objects just swipe out the name of the ps object in the code below. Anyway, we will generate and save a fasta file.
Line Break Type as Legacy Mac(CR). This may be incompatible with other programs and needs to be changed to UNIX (LS).# Object of DA ASVs
lefse_asvs <- c("ASV21", "ASV25", "ASV35", "ASV44", "ASV159", "ASV17",
"ASV174", "ASV22", "ASV234", "ASV60", "ASV114", "ASV268", "ASV14",
"ASV226", "ASV23", "ASV29", "ASV30", "ASV48", "ASV90", "ASV98", "ASV18",
"ASV41", "ASV7", "ASV43", "ASV5", "ASV54", "ASV8", "ASV9", "ASV250",
"ASV12", "ASV32", "ASV34", "ASV39", "ASV224", "ASV398", "ASV127", "ASV128",
"ASV151", "ASV323", "ASV359", "ASV374", "ASV91", "ASV56", "ASV6", "ASV165",
"ASV284", "ASV395", "ASV450", "ASV15", "ASV2", "ASV20", "ASV298", "ASV57",
"ASV69", "ASV75", "ASV82", "ASV1", "ASV70", "ASV49")
# Create ps objects
da_asvs <- prune_taxa(lefse_asvs, mergedGP)
da_asvs_full <- prune_taxa(lefse_asvs, ps_slv_work_filt)
# Create fasta file from tax_table
table2format <- tax_table(da_asvs)
#retain only the column with the sequences
table2format_trim <- table2format[, 7]
table2format_trim_df <- data.frame(row.names(table2format_trim), table2format_trim)
colnames(table2format_trim_df) <- c("ASV_ID", "ASV_SEQ")
table2format_trim_df$ASV_ID <- sub("ASV" , ">ASV", table2format_trim_df$ASV_ID) #format fasta
write.table(table2format_trim_df, "R_OUTPUT/ASV_FOR_BLAST.fasta", sep = "\r",
col.names = FALSE, row.names = FALSE, quote = FALSE, fileEncoding = "UTF-8")Here are the top BLAST hits for each DA ASV. The table displays a lot of information about each BLAST search (it scrolls along the x-axis by the way). Most importantly are the accession numbers of top BLAST hits (subject acc.var), number of 100% identical matches (num perfect hits), the percent identity, and some info on where/when the hit sequence was originally found. Where applicable, there is also PubMedIDs so you can find the paper that reported the sequence. Looking at this table will give you a preliminary sense of the ecology of these ASVs. For example, most hits come from intestinal communities, many of which are marine herbivorous fish. But the low percent identity of several ASVs indicates that these sequences have been poorly sampled. This is not surprising given the geographic skew of sampling.
Note This table also scrolls horizontally.
blast_tab <- read.table("MANUAL_INPUT/BLAT_results.txt", header = TRUE,
sep = "\t", check.names = FALSE)
write.table(blast_tab, "R_OUTPUT/Supplementary_Table_6.txt", row.names = FALSE, sep = "\t", quote = FALSE)
datatable(blast_tab, rownames = FALSE, caption = htmltools::tags$caption(style = "caption-side: bottom; text-align: left;",
"Supplementary Table 6: ", htmltools::em("Results of BLASTn analysis.")), extensions = "FixedColumns",
"Buttons", options = list(columnDefs = list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5, 6, 7, 8))), dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 65), buttons = c("csv", "copy"), scrollX = TRUE,
scrollCollapse = TRUE))Several ASVs returned more than one match at 100%. For some ASVs, the 100% matches were from the same study/study organism, so we just selected one as the representative. ASVs 6, 12, 224, and 398 returned numerous 100% matches (out of 50 total). These data were impractical to summarize and not very informative anyway. So we elected to leave these data out. If you want to see what these hits are, just grab the ASV sequence and BLAST away.
Also, some ASVs shared the same to hit. Since the table is ‘by ASV’ we retained all duplicate hits.
Short read sequences are not ideal for phylogenetic analysis, but given the data we have, we felt this was a good place to start. This analysis required several steps.
mothur "#align.seqs(candidate=sequence_tree.fasta, template=silva.nr_v132.align, processors=20, flip=t)"
mothur "#filter.seqs(fasta=sequence_tree.fasta, vertical = =F, hard=mask.txt)"
mothur "#classify.seqs(fasta=sequence_tree.filter.fasta, template=silva.nr_v132.align, taxonomy=silva.nr_v132.tax, processors=10)"
raxmlHPC-PTHREADS-SSE3 -T 24 -f a -p 2345 -x 3456 -m GTRGAMMA -N 1000 -s sequence_tree.filter.fasta -n fish_arb_align_1000BS_B.tre
Now it was time to put the pieces together and determine…
What these patterns tell us about specificity?
To dig just a little deeper, we screened our DA ASVs against the IMNGS database. IMNGS hosts a curated database of short-read sequences scraped from the International Nucleotide Sequence Database Collaboration (GenBank, DDBJ and EMBL). The database is rebuilt monthly and at the time of this analysis, contained 271,237 samples. IMNG is really designed to screen full-length 16S rRNA sequences and is not ideal for shorter reads. This is because the database is built from short reads, and different studies target different regions of the gene. For example, if we get no hits to an ASV it could mean the organisms it came from has really not been detected before or that it is in the database but is represented by a different 16S region. So take these data with a grain of salt.
IMNGS is not a high-throughput system. User may only submit a maximum of 10 sequences per query and this can (and will) take weeks to run. So choose your ASVs carefully.
IMNGS will return a lot of useful data for each query sequence. All we were interested in here was the number of hits, but there much more really useful data here. Among other data products, IMNGS returns report tables that tally the number of samples that were positive for the presence of query-like sequences for each sample category—categories like shrimp gut metagenome and seawater metagenome. Each category has a number of short-read samples, which could originate from a single study or multiple studies.
A report includes values for several percent identity cutoff values. We set a minimum threshold at 97% so our reports have values for 97 and 99%. You can set the threshold as low as 90%.
IMNGS provides three such reports based on the abundance of your query sequence.
An SRA-derived sample is considered positive if the query-like sequences sum up to more than 0% of the total number of sequences in that sample (i.e. any abundance).
An SRA-derived sample is considered positive if the query-like sequences sum up to more than 0.1% of the total number of sequences in that sample (i.e. excluding rare abundances).
An SRA-derived sample is considered positive if the query-like sequences sum up to more than 1% of the total number of sequences in that sample (i.e. including only dominant OTUs).
We report data from 97% cutoff identity and 0.1% of total reads in a sample. I think this is a little confusing so let me explain by example. In the tree below, ASV398 is most closely related to an Alphaproteobacteria associated with the toxic benthic marine dinoflagellate, Ostreopsis ovata. We retrieved this sequence during the BLASTn analysis discussed above. Anyway, ASV398 was screened against IMNGS and returned 2460 hits. This means that at 97% identity, 2460 samples had an ASV398-like sequence comprising greater than 0.1% of a given samples total number of sequences. If for example we increase the percent identity to 99% the number of sample hits drops to 138. If instead we look at the 0% report (97% identity), the number of sample hits increases to 6323.
We also compared the final list of top hits to the Sullam et. al. paper, specifically Table S1 from that paper. Because this paper was published in 2012, there were many sequence hits in our db that did not appear in the original paper. However, for those that did, we added the Sullam lifestyle category designations to the tree metadata.
We took all of these data and used iTOL to visualize the tree. For each top hit, we added the isolation source/natural host information, taxonomic affiliation, and Sullam lifestyle category. We also overlaid the number of hits to the IMNGS database for each ASV.
To view a full, interactive version of the tree go this iTOL page.
We used the tree to infer the lifestyle category of each ASVs based on the closest relatives in their clade. This was not a quantitative, but rather a user guided determination. Aside from MetaMetaDb (discussed above) we are not aware of any tool currently available to quantitatively assess habitat preference of a 16S rRNA sequence.
For simplicity we focused on three lifestyle catagories (though we have seven categories in the tree). Our reasoning—again based on the work of Sullam et. al.—was that fish intestines contain microbes that come from the environment (what they eat, where they live), microbes that are there because fish are animals with guts, and microbes that are there because fish are fish and have a physiology and evolutionary history that select specific organisms. Sullam et.al. were also looking at fish from different habitats (freshwater, estuary, marine) and tropic levels (carnivores, herbivores, omnivores) while our study was more narrow in scope.
We combined these habitat predictions with the results of the BLASTn analysis, scan of the IMNGS database, Sullam lifestyle categories, etc. and put it all in one editable table. So if you disagree with a habitat prediction, you can feel free to change it.
Description of table headings
Note This table also scrolls horizontally.
habi_tab <- read.table("MANUAL_INPUT/habitat_specificity.txt", header = TRUE,
sep = "\t", check.names = FALSE)
habi_tab2 <- habi_tab[order(habi_tab$habitat_code, habi_tab$Enriched), ] # order by habitat and host enriched
#habi_tab <- habi_tab[, -2] #delete code column
datatable(habi_tab2, rownames = FALSE, colnames = c(
"ASV", "Habitat code", "Putative habitat", "Enriched",
"Taxon", "Closest db match", "Per identity", "Subject acc",
"IMNGS hits", "Sullam lifestyle", "Num perfect hits"),
editable = TRUE,
caption = htmltools::tags$caption(style = "caption-side: bottom; text-align: left;",
"Table 6: ", htmltools::em("Assessing habitat specificity.")),
extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))),
dom = "Blfrtip", pageLength = 5, lengthMenu = c(5, 10, 25, 60),
buttons = c("csv", "copy"), scrollX = TRUE, scrollCollapse = TRUE))write.table(habi_tab2, "R_OUTPUT/habitat_specificity.txt", sep = "\t",
col.names = TRUE, row.names = TRUE, quote = FALSE, fileEncoding = "UTF-8")NR Indicates Not Recorded. Four ASVs had numerous hits at 100% identity. We did not include top hit data for these ASVs.
Now we can summarize the data for each lifestyle category. This table was constructed in a text file and read into R.
habi_summary <- read.table("MANUAL_INPUT/Table_1.txt", header = TRUE,
sep = "\t", check.names = FALSE)
datatable(habi_summary, rownames = FALSE, editable = TRUE,
caption = htmltools::tags$caption(style = "caption-side: bottom; text-align: left;",
"Table 1: ", htmltools::em("Summary of habitat specificity.")),
extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5))),
dom = "Brti",
buttons = c("csv", "copy"), scrollX = TRUE, scrollCollapse = TRUE))So we know that Host X is enriched for some ASV from taxa Y. Is this part of a larger pattern or an isolated case? For a given taxonomic group and rank, what proportion of total reads (from all ASVs) were found in a particular host species? At some point it would be nice if this were an interactive step, but for now we must modify the code below to look at different taxa. This example will look at the family Desulfovibrionaceae (Deltaproteopbacteria)
calc_tax_prop <- subset_taxa(mergedGP, Family == "Desulfovibrionaceae") # Change this to select different taxa
calc_tax_prop## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 71 taxa and 5 samples ]
## sample_data() Sample Data: [ 5 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 71 taxa by 8 taxonomic ranks ]
sample_sums_by_taxa <- sample_sums(calc_tax_prop)
total_taxa_reads <- sum(sample_sums_by_taxa)
sample_sums_by_taxa <- as.data.frame(sample_sums_by_taxa)
sample_sums_by_taxa$proportion <-
(sample_sums_by_taxa$sample_sums_by_taxa/total_taxa_reads) * 100
colnames(sample_sums_by_taxa) <- c("total taxa reads", "Proportion")
sample_sums_by_taxa$Proportion <- round(sample_sums_by_taxa$Proportion, digits = 2)
total_taxa_reads## [1] 206831
sample_sums_by_taxaGreat. Looks like there are 71 Desulfovibrionaceae ASVs and the majority (> 90%) of the reads are from Acanthurus. This is interesting. We can do this with any taxa we wish.
Also, for the record, here is the proportion of Cyanobacteria reads by host species. We used the code above and the original phyloseq object before removing the Cyanobacteria. There were a total of 1074 ASVs.
| Host species | total taxa reads | Proportion |
|---|---|---|
| AcCoe | 117909 | 53.27 |
| AcTra | 43261 | 19.54 |
| ScTae | 31013 | 14.01 |
| SpAur | 14534 | 6.57 |
| SpVir | 14636 | 6.61 |
At this point we know which ASVs are enriched in which host species, the lineage of those ASVs, and something about where else these sequences have been detected in nature. Next we would like to know the proportion of total reads for each ASV that is found in each host species. We start with a summary table of these data.
# calculate the averages and merge by species
# grab the da_asv ps object & merge by samples
daASV_mergedGP_BAR <- merge_samples(da_asvs_full, "Sp")
#daASV_SD_BAR <- merge_samples(sample_data(da_asvs_full), "Sp")
# calculate percent proportion
daASV_AVG <- apply(t(otu_table(daASV_mergedGP_BAR)), 1, function(x) x/sum(x))
# transpose
daASV_t_AVG <- t(daASV_AVG)
daASV_t_AVG_df <- as.data.frame(daASV_t_AVG)
######################
da_ASV_tax <- habi_tab[c("ASV", "Taxon", "Putative_habitat")] # choose columns of interest
da_ASV_tax2 <- da_ASV_tax[,-1]
rownames(da_ASV_tax2) <- da_ASV_tax[,1]
######################
######################
# combine based on ASV column
daASV_work <- merge(daASV_t_AVG_df, da_ASV_tax2, by = 0, all = TRUE)
rownames(daASV_work) <- daASV_work[,1]
daASV_work$Row.names <- NULL
# then make column row.names
daASV_work2 <- cbind(ASV = rownames(daASV_work), daASV_work)
#melt the df
daASV_work3 <- melt(daASV_work2, value.name = "ASV") # wide to long format?
colnames(daASV_work3) <- c("ASV", "Taxon", "Putative_habitat", "Sample", "Proportion")
daASV_work3$Proportion <- round(daASV_work3$Proportion, digits = 5)
datatable(daASV_work3, rownames = TRUE, editable = FALSE,
caption = htmltools::tags$caption(style = "caption-side: bottom; text-align: left;",
"Table 8: ", htmltools::em("DA ASV sample proportion.")),
extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5))),
dom = "Blfrtip", pageLength = 5, lengthMenu = c(5,10, 50, 100, 300),
buttons = c("csv", "copy"), scrollX = TRUE, scrollCollapse = TRUE))#write.table(daASV_work3, "R_OUTPUT/Table_18.txt",
# sep = "\t", row.names = FALSE, quote = FALSE)Now that we have a list of DA ASVs and their assigned habitat preference, we want to create a R object that organizes these in some logical fashion. We can then use this object to order subsequent graphs and tables. So lets order the ASVs by putative habitat preference and then by the host species in which that ASV was enriched. Seems reasonable enough? Depending on the R command, some objects need to be in ascending order, others in descending order.
asv_order <- c("ASV450", "ASV165", "ASV395", "ASV284", "ASV56", "ASV6",
"ASV359", "ASV128", "ASV127", "ASV91", "ASV374", "ASV151",
"ASV323", "ASV398", "ASV224", "ASV39", "ASV34", "ASV12",
"ASV32", "ASV250", "ASV43", "ASV54", "ASV9", "ASV5", "ASV49",
"ASV8", "ASV41", "ASV18", "ASV7", "ASV90", "ASV29", "ASV98",
"ASV23", "ASV30", "ASV226", "ASV48", "ASV70", "ASV1", "ASV14",
"ASV298", "ASV82", "ASV75", "ASV69", "ASV57", "ASV20", "ASV15",
"ASV2", "ASV268", "ASV114", "ASV234", "ASV174", "ASV60", "ASV17",
"ASV22", "ASV159", "ASV44", "ASV25", "ASV21", "ASV35")
asv_order_rev <- rev(asv_order)Lets see if we can overlay all of this information in one “easy” to understand plot. The first thing is to do is plot the proportion of reads for a given ASV from each host species.
daASV_work3$ASV <- as.character(daASV_work3$ASV)
daASV_work3$ASV <- factor(daASV_work3$ASV, levels = unique(daASV_work3$ASV))
daASV_work3$ASV <- factor(daASV_work3$ASV, levels= asv_order)Next, we created a bar plot of read proportion by host species for each ASV. And save a copy to the FIGURES/ directory.
Code for proportional bar chart
#Bar charts
ASV_bar <- ggplot(daASV_work3, aes_string(x = "ASV",
y = "Proportion", fill = "Sample"), environment = .e, ordered = TRUE, xlab = "x-axis label",
ylab = "y-axis label")
ASV_bar <- ASV_bar + geom_bar(stat = "identity", position = position_stack(reverse = TRUE),
width = 0.95) + coord_flip() + theme(aspect.ratio = 2/1)
ASV_bar <- ASV_bar + scale_fill_manual(values = samp_pal)
ASV_bar <- ASV_bar + theme(axis.text.x = element_text(angle = 0, hjust = 0.95,
vjust = 1))
ASV_bar <- ASV_bar + guides(fill = guide_legend(override.aes = list(colour = NULL),
reverse = FALSE)) + theme(legend.key = element_rect(colour = "black"))
ASV_bar <- ASV_bar + labs(x = "Host species", y = "Proportion (% total reads)",
title = "ASV Proportion by host species")
ASV_bar <- ASV_bar + theme(axis.line = element_line(colour = "black"), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_rect(colour = "black",
fill = NA, size = 1))
ASV_barpdf("R_OUTPUT/bar_plot_read_prop.pdf")
ASV_bar
invisible(dev.off())Code for heatmap
# Heatmap
library(ComplexHeatmap)
library(circlize)
library(heatmap3)
library(gdata)
fig4_heat <- as.data.frame(t(otu_table(da_asvs)))
fig4_tax <- as.data.frame(habi_tab2) # Convert habi_table to df and store in new variable
fig4_tax_tab <- fig4_tax[, -1] # eliminate 1st column so can combine based on row names
rownames(fig4_tax_tab) <- fig4_tax[, 1] # Make new row.names from original table
fig4_tax_tab <- fig4_tax_tab[c(4,2,3,1,5,6,7,8)] # Reorder
fig4_tax_tab <- fig4_tax_tab[c(1:4)] # Select columns
# Combine the two df by rowname
# If the matching involved row names, an extra character column called Row.names
#is added at the left, and in all cases the result has ‘automatic’ row names.
fig4_heatmap2 <- merge(fig4_tax_tab, fig4_heat, by = 0, all = TRUE)
fig4_heatmap <- subset(fig4_heatmap2, select=-c(Row.names))
rownames(fig4_heatmap) <- fig4_heatmap2[,'Row.names']
fig4_heatmap <- cbind(ASV = rownames(fig4_heatmap), fig4_heatmap) # make rownames a column
fig4_heatmap$ASV <- factor(fig4_heatmap$ASV, levels=rev(asv_order))
fig4_heatmap <- fig4_heatmap[order(fig4_heatmap$ASV),]
fig4_heatmap$ID <- paste(fig4_heatmap$ASV, fig4_heatmap$Taxon, fig4_heatmap$Putative_habitat, fig4_heatmap$Enriched, sep = "_") # combine the columns to make one name
fig4_heatmap2 <- fig4_heatmap[-c(1:5)] # delete the original columns
fig4_heatmap2 <- fig4_heatmap2[c(6,1,2,3,4,5)] # reorder
rownames(fig4_heatmap2) <- fig4_heatmap2[, 1]
fig4_heatmap2 <- fig4_heatmap2[-1]####Define Colors
taxa_colors <- unlist(lapply(row.names(fig4_heatmap2), function(x){
if(grepl
# environmental
("Alphaproteobacteria", x)) '#000000'
else if(grepl("Pirellulaceae", x)) '#000000'
else if(grepl("Rubritaleaceae", x)) '#000000'
else if(grepl("Flavobacteriaceae", x)) '#000000'
# fish/animal
else if(grepl("Desulfovibrionaceae", x)) '#0072b2'
else if(grepl("Lachnospiraceae", x)) '#f0e442'
else if(grepl("Erysipelotrichaceae", x)) '#009e73'
else if(grepl("Ruminococcaceae", x)) '#e69f00'
else if(grepl("Bacteroidales", x)) '#d55e00'
else if(grepl("Fusobacteriaceae", x)) '#56b4e9'
# Undertermined
else if(grepl("Vibrionaceae", x)) '#cc79a7'
# other
else if(grepl("Family_XIII", x)) '#808080'
else if(grepl("Mollicutes", x)) '#808080'
else if(grepl("Brevinemataceae", x)) '#808080'
else if(grepl("Peptostreptococcaceae", x)) '#808080'
}))
habitat_colors <- unlist(lapply(row.names(fig4_heatmap2), function(x){
if(grepl
("fish", x)) '#808080'
else if(grepl("animal", x)) '#000000'
else if(grepl("environmental", x)) '#808080'
else if(grepl("undetermined", x)) '#000000'
}))
heatColors <- cbind(taxa_colors, habitat_colors)
colnames(heatColors)[1] <- "Taxa"
colnames(heatColors)[2] <- "Habitat"
col <- colorRampPalette(bias = 1, c("#000033", "#66CCFF"))(16)
### Save heatmap
pdf(file = "R_OUTPUT/Figure_4.pdf")
heatmap3(fig4_heatmap2, cexRow=0.5, cexCol=1,
margins = c(3,13), RowSideColors=heatColors, scale="row", Colv = NA,
Rowv = NA, revC = TRUE, balanceColor = FALSE, col = col)
invisible(dev.off())
heatmap3(fig4_heatmap2,cexRow=0.5, cexCol=1,
margins = c(3,13), RowSideColors=heatColors, scale="row", Colv = NA,
Rowv = NA, revC = TRUE, balanceColor = FALSE, col = col)Here is code for other representation of taxa abundance. These are raw R images that have not been gussied up.
We will create two different representations of relative abundance for each sample (arranged by host species) by major Classes—facet grid box-and-whisker plots and bar charts. We will generate each separately (and save) using an earlier relative abundance phyloseq object and then display the combined output.
Code for box-and-whisker plot.
mdata_phy_all <- tax_glom(ps_slv_filt_AVG, taxrank = "Class", NArm = FALSE)
# You can choose any taxonomic level here
mdata_phyrel_all <- transform_sample_counts(mdata_phy_all, function(x) x/sum(x))
meltd_all <- psmelt(mdata_phyrel_all)
meltd_all$Class <- as.character(meltd_all$Class)
means <- ddply(meltd_all, ~Class, function(x) c(mean = mean(x$Abundance)))
taxa_means <- means[order(-means$mean), ] # decending order
taxa_means <- format(taxa_means, scientific = FALSE) # ditch the sci notation
Other <- means[means$mean <= 0.027, ]$Class # Here we conglomerate at 2%.
meltd_all[meltd_all$Class %in% Other, ]$Class <- "Other"
samp_names <- aggregate(meltd_all$Abundance, by = list(meltd_all$Sample),
FUN = sum)[, 1]
.e <- environment()
meltd_all[, "Class"] <- factor(meltd_all[, "Class"], sort(unique(meltd_all[,
"Class"])))
meltd_all <- meltd_all[order(meltd_all[, "Class"]), ]
levels(meltd_all$Class)
meltd_all$Class <- factor(meltd_all$Class, levels = c("Bacteroidia", "Clostridia",
"Erysipelotrichia", "Fusobacteriia", "Alphaproteobacteria", "Deltaproteobacteria",
"Gammaproteobacteria", "Planctomycetacia", "Other")) # Here we order Classes by the Phylum they belong to.
sup_fig1 <- qplot(data = meltd_all, x = Sp, y = Abundance, fill = Class,
geom = "boxplot", ylab = "Relative Abundance") + theme(legend.position = "bottom") +
facet_grid(Class ~ ., scales = "free_y", space = "free_y") + geom_jitter(width = 0.05) +
geom_point(colour = "black", fill = "white") #+ guides(guide_legend(reverse = FALSE) )
sup_fig1 <- sup_fig1 + scale_fill_manual(values = friend_pal) + labs(x = "Host species",
y = "Relative abundance (% total reads)")
sup_fig1pdf("R_OUTPUT/box-and-whisker.pdf")
sup_fig1
invisible(dev.off())Code for bar plot.
sup_fig2 <- ggplot(meltd_all, aes_string(x = "Sample", y = "Abundance",
fill = "Class"), environment = .e, Ordered = TRUE)
sup_fig2 <- sup_fig2 + geom_bar(stat = "identity", position = "stack") +
facet_grid(Class ~ Sp, scales = "free", space = "free")
sup_fig2 <- sup_fig2 + scale_fill_manual(values = friend_pal)
# sup_fig2 <- sup_fig2 + theme(axis.text.x = element_text(angle = -90,
# hjust = 0))
sup_fig2 <- sup_fig2 + theme(axis.text.x = element_blank())
sup_fig2 <- sup_fig2 + guides(fill = guide_legend(override.aes = list(colour = NULL),
reverse = FALSE)) + theme(legend.key = element_rect(colour = "black"),
legend.position = "bottom") +
labs(x = "Individual samples", y = "Relative abundance (% total reads)")
sup_fig2Supplementary Figure 1. Relative abundance of major Classes by sample
pdf("R_OUTPUT/Supplementary_Figure_1.pdf")
sup_fig2
invisible(dev.off())Specific tools
– phyloseq as the primary analytical package.
– LEfSe to identify differentially abundant (DA) amplicon sequence variants (ASV) across host fish species.
– MicrobiomeAnalyst to conduct LEfSe analysis.
– BLASTn and Silva ACT to identify closest hits to DA ASVs.
– RAxML for phylogenetic inference of DA ASVs and closest hits.
– iTOL for visualization of tree and associated metadata.
Other valuable resources
– R Markdown: The Definitive Guide
– knitr tutorials. Fantastic site!
– Microbiota analysis in R, UCR Workshop 2018 nicely documented workflow with examples.
It is now time to submit the data to your favorite sequence read archive. We submitted out data to the European Nucleotide Archive (ENA). The ENA does not like RAW data and prefers to have primers removed. So we submitted the trimmed Fastq files to the ENA. You can find these data under the study accession number PRJEB28397. The RAW files on our figshare site. TODO INSERT DOI/LINK
To submit to the ENA you need two data tables (plus your sequence data). The first file describes the samples and the second file describes the sequencing data. The original files can be found on the figshare site.
Below are the specific packages and versions used in this workflow using both sessionInfo() and devtools::session_info().
proc.time() - ptm## user system elapsed
## -8062.672 -117.683 -3055.536
sessionInfo()## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] gdata_2.18.0 heatmap3_1.1.1
## [3] circlize_0.4.4 ComplexHeatmap_1.19.1
## [5] bindrcpp_0.2.2 ggthemes_4.0.1
## [7] pairwiseAdonis_0.0.1 cluster_2.0.7-1
## [9] plotly_4.8.0 RCurl_1.95-4.11
## [11] bitops_1.0-6 svgPanZoom_0.3.3
## [13] gridExtra_2.3 forcats_0.3.0
## [15] stringr_1.3.1 dplyr_0.7.6
## [17] purrr_0.2.5 readr_1.1.1
## [19] tidyr_0.8.1 tibble_1.4.2
## [21] tidyverse_1.2.1 formatR_1.5
## [23] pander_0.6.2 rmarkdown_1.10
## [25] DT_0.4 data.table_1.11.4
## [27] kableExtra_0.9.0 knitr_1.20
## [29] rstudioapi_0.7 reshape2_1.4.3
## [31] scales_1.0.0 vegan_2.5-2
## [33] lattice_0.20-35 permute_0.9-4
## [35] plyr_1.8.4 ggplot2_3.0.0
## [37] phyloseq_1.25.2 ShortRead_1.39.0
## [39] GenomicAlignments_1.17.3 SummarizedExperiment_1.11.6
## [41] DelayedArray_0.7.21 matrixStats_0.54.0
## [43] Biobase_2.41.2 Rsamtools_1.33.3
## [45] GenomicRanges_1.33.11 GenomeInfoDb_1.17.1
## [47] Biostrings_2.49.0 XVector_0.21.3
## [49] IRanges_2.15.16 S4Vectors_0.19.19
## [51] BiocParallel_1.15.8 BiocGenerics_0.27.1
## [53] dada2_1.8.0 Rcpp_0.12.18
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.3-2 rjson_0.2.20 hwriter_1.3.2
## [4] rprojroot_1.3-2 GlobalOptions_0.1.0 base64enc_0.1-3
## [7] lubridate_1.7.4 xml2_1.2.0 codetools_0.2-15
## [10] splines_3.5.0 ade4_1.7-11 jsonlite_1.5
## [13] broom_0.5.0 shiny_1.1.0 compiler_3.5.0
## [16] httr_1.3.1 backports_1.1.2 assertthat_0.2.0
## [19] Matrix_1.2-14 lazyeval_0.2.1 cli_1.0.0
## [22] later_0.7.3 htmltools_0.3.6 tools_3.5.0
## [25] igraph_1.2.2 gtable_0.2.0 glue_1.3.0
## [28] GenomeInfoDbData_1.1.0 cellranger_1.1.0 multtest_2.37.0
## [31] ape_5.1 nlme_3.1-137 iterators_1.0.10
## [34] crosstalk_1.0.0 fastcluster_1.1.25 rvest_0.3.2
## [37] mime_0.5 gtools_3.8.1 zlibbioc_1.27.0
## [40] MASS_7.3-50 promises_1.0.1 hms_0.4.2
## [43] biomformat_1.8.0 rhdf5_2.25.4 RColorBrewer_1.1-2
## [46] yaml_2.2.0 latticeExtra_0.6-28 stringi_1.2.4
## [49] highr_0.7 foreach_1.4.4 shape_1.4.4
## [52] rlang_0.2.1 pkgconfig_2.0.1 evaluate_0.11
## [55] Rhdf5lib_1.2.1 bindr_0.1.1 labeling_0.3
## [58] htmlwidgets_1.2 tidyselect_0.2.4 magrittr_1.5
## [61] R6_2.2.2 pillar_1.3.0 haven_1.1.2
## [64] withr_2.1.2 mgcv_1.8-24 survival_2.42-6
## [67] modelr_0.1.2 crayon_1.3.4 GetoptLong_0.1.7
## [70] readxl_1.1.0 digest_0.6.15 xtable_1.8-2
## [73] httpuv_1.4.5 RcppParallel_4.4.1 munsell_0.5.0
## [76] viridisLite_0.3.0
devtools::session_info()## Session info -------------------------------------------------------------
## setting value
## version R version 3.5.0 (2018-04-23)
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/Panama
## date 2018-11-24
## Packages -----------------------------------------------------------------
## package * version date
## ade4 1.7-11 2018-04-05
## ape 5.1 2018-04-04
## assertthat 0.2.0 2017-04-11
## backports 1.1.2 2017-12-13
## base * 3.5.0 2018-04-24
## base64enc 0.1-3 2015-07-28
## bindr 0.1.1 2018-03-13
## bindrcpp * 0.2.2 2018-03-29
## Biobase * 2.41.2 2018-07-18
## BiocGenerics * 0.27.1 2018-06-17
## BiocParallel * 1.15.8 2018-07-19
## biomformat 1.8.0 2018-05-01
## Biostrings * 2.49.0 2018-05-01
## bitops * 1.0-6 2013-08-17
## broom 0.5.0 2018-07-17
## cellranger 1.1.0 2016-07-27
## circlize * 0.4.4 2018-06-10
## cli 1.0.0 2017-11-05
## cluster * 2.0.7-1 2018-04-13
## codetools 0.2-15 2016-10-05
## colorspace 1.3-2 2016-12-14
## compiler 3.5.0 2018-04-24
## ComplexHeatmap * 1.19.1 2018-06-19
## crayon 1.3.4 2017-09-16
## crosstalk 1.0.0 2016-12-21
## dada2 * 1.8.0 2018-07-31
## data.table * 1.11.4 2018-05-27
## datasets * 3.5.0 2018-04-24
## DelayedArray * 0.7.21 2018-07-23
## devtools 1.13.6 2018-06-27
## digest 0.6.15 2018-01-28
## dplyr * 0.7.6 2018-06-29
## DT * 0.4 2018-01-30
## evaluate 0.11 2018-07-17
## fastcluster 1.1.25 2018-06-07
## forcats * 0.3.0 2018-02-19
## foreach 1.4.4 2017-12-12
## formatR * 1.5 2017-04-25
## gdata * 2.18.0 2017-06-06
## GenomeInfoDb * 1.17.1 2018-05-11
## GenomeInfoDbData 1.1.0 2018-06-01
## GenomicAlignments * 1.17.3 2018-07-18
## GenomicRanges * 1.33.11 2018-07-20
## GetoptLong 0.1.7 2018-06-10
## ggplot2 * 3.0.0 2018-07-03
## ggthemes * 4.0.1 2018-08-24
## GlobalOptions 0.1.0 2018-06-09
## glue 1.3.0 2018-07-17
## graphics * 3.5.0 2018-04-24
## grDevices * 3.5.0 2018-04-24
## grid * 3.5.0 2018-04-24
## gridExtra * 2.3 2017-09-09
## gtable 0.2.0 2016-02-26
## gtools 3.8.1 2018-06-26
## haven 1.1.2 2018-06-27
## heatmap3 * 1.1.1 2015-04-12
## highr 0.7 2018-06-09
## hms 0.4.2 2018-03-10
## htmltools 0.3.6 2017-04-28
## htmlwidgets 1.2 2018-04-19
## httpuv 1.4.5 2018-07-19
## httr 1.3.1 2017-08-20
## hwriter 1.3.2 2014-09-10
## igraph 1.2.2 2018-07-27
## IRanges * 2.15.16 2018-07-18
## iterators 1.0.10 2018-07-13
## jsonlite 1.5 2017-06-01
## kableExtra * 0.9.0 2018-05-21
## knitr * 1.20 2018-02-20
## labeling 0.3 2014-08-23
## later 0.7.3 2018-06-08
## lattice * 0.20-35 2017-03-25
## latticeExtra 0.6-28 2016-02-09
## lazyeval 0.2.1 2017-10-29
## lubridate 1.7.4 2018-04-11
## magrittr 1.5 2014-11-22
## MASS 7.3-50 2018-04-30
## Matrix 1.2-14 2018-04-13
## matrixStats * 0.54.0 2018-07-23
## memoise 1.1.0 2017-04-21
## methods * 3.5.0 2018-04-24
## mgcv 1.8-24 2018-06-18
## mime 0.5 2016-07-07
## modelr 0.1.2 2018-05-11
## multtest 2.37.0 2018-05-01
## munsell 0.5.0 2018-06-12
## nlme 3.1-137 2018-04-07
## pairwiseAdonis * 0.0.1 2018-09-09
## pander * 0.6.2 2018-07-08
## parallel * 3.5.0 2018-04-24
## permute * 0.9-4 2016-09-09
## phyloseq * 1.25.2 2018-07-15
## pillar 1.3.0 2018-07-14
## pkgconfig 2.0.1 2017-03-21
## plotly * 4.8.0 2018-07-20
## plyr * 1.8.4 2016-06-08
## promises 1.0.1 2018-04-13
## purrr * 0.2.5 2018-05-29
## R6 2.2.2 2017-06-17
## RColorBrewer 1.1-2 2014-12-07
## Rcpp * 0.12.18 2018-07-23
## RcppParallel 4.4.1 2018-07-19
## RCurl * 1.95-4.11 2018-07-15
## readr * 1.1.1 2017-05-16
## readxl 1.1.0 2018-04-20
## reshape2 * 1.4.3 2017-12-11
## rhdf5 2.25.4 2018-06-02
## Rhdf5lib 1.2.1 2018-05-17
## rjson 0.2.20 2018-06-08
## rlang 0.2.1 2018-05-30
## rmarkdown * 1.10 2018-06-11
## rprojroot 1.3-2 2018-01-03
## Rsamtools * 1.33.3 2018-07-20
## rstudioapi * 0.7 2017-09-07
## rvest 0.3.2 2016-06-17
## S4Vectors * 0.19.19 2018-07-18
## scales * 1.0.0 2018-08-09
## shape 1.4.4 2018-02-07
## shiny 1.1.0 2018-05-17
## ShortRead * 1.39.0 2018-06-13
## splines 3.5.0 2018-04-24
## stats * 3.5.0 2018-04-24
## stats4 * 3.5.0 2018-04-24
## stringi 1.2.4 2018-07-20
## stringr * 1.3.1 2018-05-10
## SummarizedExperiment * 1.11.6 2018-07-17
## survival 2.42-6 2018-07-13
## svgPanZoom * 0.3.3 2018-08-23
## tibble * 1.4.2 2018-01-22
## tidyr * 0.8.1 2018-05-18
## tidyselect 0.2.4 2018-02-26
## tidyverse * 1.2.1 2017-11-14
## tools 3.5.0 2018-04-24
## utils * 3.5.0 2018-04-24
## vegan * 2.5-2 2018-05-17
## viridisLite 0.3.0 2018-02-01
## withr 2.1.2 2018-03-15
## xml2 1.2.0 2018-01-24
## xtable 1.8-2 2016-02-05
## XVector * 0.21.3 2018-06-23
## yaml 2.2.0 2018-07-25
## zlibbioc 1.27.0 2018-05-01
## source
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@1.1.2)
## local
## cran (@0.1-3)
## cran (@0.1.1)
## cran (@0.2.2)
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## local
## Bioconductor (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Github (benjjneb/dada2@630ef9a)
## CRAN (R 3.5.0)
## local
## Bioconductor
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.1)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@1.1.25)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## local
## local
## local
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@0.4.2)
## cran (@0.3.6)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Bioconductor
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@1.20)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## local
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Bioconductor
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Github (pmartinezarbizu/pairwiseAdonis@c8037b7)
## CRAN (R 3.5.0)
## local
## CRAN (R 3.5.0)
## Bioconductor
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@0.2.5)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@1.1.1)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Bioconductor
## Bioconductor
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@1.3-2)
## Bioconductor
## CRAN (R 3.5.0)
## cran (@0.3.2)
## Bioconductor
## cran (@1.0.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Bioconductor
## local
## local
## local
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Bioconductor
## CRAN (R 3.5.0)
## Github (timelyportfolio/svgPanZoom@c0b2a46)
## CRAN (R 3.5.0)
## cran (@0.8.1)
## cran (@0.2.4)
## CRAN (R 3.5.0)
## local
## local
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## cran (@1.2.0)
## CRAN (R 3.5.0)
## Bioconductor
## CRAN (R 3.5.0)
## Bioconductor
end_time <- Sys.time()
end_time - start_time## Time difference of 1.180127 mins